HubPPR: Effective Indexing for Approximate Personalized PageRank

Authors

  • Sibo Wang
  • Youze Tang
  • Xiaokui Xiao
  • Yin Yang
  • Zengxiang Li
Abstract

Personalized PageRank (PPR) computation is a fundamental operation in web search, social networks, and graph analysis. Given a graph G, a source s, and a target t, the PPR query π(s, t) returns the probability that a random walk on G starting from s terminates at t. Unlike global PageRank, which can be effectively pre-computed and materialized, the PPR result depends on both the source and the target, making it infeasible to materialize results for large graphs. Existing indexing techniques have rather limited effectiveness; in fact, the current state-of-the-art solution, BiPPR, answers individual PPR queries without pre-computation or indexing, and yet it outperforms all previous index-based solutions. Motivated by this, we propose HubPPR, an effective indexing scheme for PPR computation with controllable tradeoffs among accuracy, query time, and memory consumption. The main idea is to pre-compute and index auxiliary information for selected hub nodes that are often involved in PPR processing. Going one step further, we extend HubPPR to answer top-k PPR queries, which return the k nodes with the highest PPR values with respect to a source s, among a given set T of target nodes. Extensive experiments demonstrate that, compared to the current best solution BiPPR, HubPPR achieves up to 10x and 220x speedup for PPR and top-k PPR processing, respectively, with moderate memory consumption. Notably, with a single commodity server, HubPPR answers a top-k PPR query in seconds on graphs with billions of edges, with high accuracy and strong result quality guarantees.
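
To make the definition of π(s, t) concrete, the sketch below estimates PPR by plain Monte Carlo simulation: it runs many random walks from s and reports the fraction that terminate at each node. This is not HubPPR or BiPPR, only a naive baseline that illustrates the quantity being queried. The termination probability `alpha`, the dead-end rule, the function names, and the toy graph are all assumptions made for illustration; none of them come from the abstract.

```python
import random
from collections import Counter


def sample_walk_endpoint(graph, source, alpha, rng):
    """Run one random walk from `source` and return the node where it terminates.

    Before each move the walk stops with probability `alpha` (an assumed parameter).
    """
    node = source
    while rng.random() >= alpha:
        neighbors = graph.get(node, [])
        if not neighbors:
            # Assumption: a walk that reaches a dead end terminates there.
            break
        node = rng.choice(neighbors)
    return node


def estimate_ppr(graph, source, alpha=0.2, num_walks=20000, seed=0):
    """Estimate pi(source, t) for every t as the fraction of walks ending at t."""
    rng = random.Random(seed)
    counts = Counter(
        sample_walk_endpoint(graph, source, alpha, rng) for _ in range(num_walks)
    )
    return {node: count / num_walks for node, count in counts.items()}


def top_k_ppr(graph, source, targets, k, **kwargs):
    """Return the k nodes of `targets` with the highest estimated PPR from `source`."""
    scores = estimate_ppr(graph, source, **kwargs)
    ranked = sorted(targets, key=lambda t: scores.get(t, 0.0), reverse=True)
    return ranked[:k]


# Hypothetical toy graph as adjacency lists (directed edges).
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}

scores = estimate_ppr(graph, "a")
top2 = top_k_ppr(graph, "a", targets=["b", "c", "d"], k=2)
print(scores)
print(top2)
```

Each query here costs `num_walks` full simulations, which is the kind of per-query cost that motivates indexing schemes; the sketch makes no attempt to reproduce the accuracy guarantees mentioned in the abstract.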

Similar resources

Community Detection Using Time-Dependent Personalized PageRank

Local graph diffusions have proven to be valuable tools for solving various graph clustering problems. As such, there has been much interest recently in efficient local algorithms for computing them. We present an efficient local algorithm for approximating a graph diffusion that generalizes both the celebrated personalized PageRank and its recent competitor/companion the heat kernel. Our algor...

Approximating Personalized PageRank with Minimal Use of Web Graph Data

In this paper, we consider the problem of calculating fast and accurate approximations to the personalized PageRank score ([8, 16]) of a webpage. We focus on techniques to improve speed by limiting the amount of webgraph data we need to access. PageRank scores are mainly used for ranking purposes, and generally only the scores exceeding a given threshold are relevant. In practice, and relative ...

Strong Localization in Personalized PageRank Vectors

Abstract. The personalized PageRank diffusion is a fundamental tool in network analysis tasks like community detection and link prediction. This tool models the spread of a quantity from a small, initial set of seed nodes, and has long been observed to stay localized near this seed set. We derive a sublinear upper-bound on the number of nonzeros necessary to approximate a personalized PageRank ...

Detecting Sharp Drops in PageRank and a Simplified Local Partitioning Algorithm

We show that whenever there is a sharp drop in the numerical rank defined by a personalized PageRank vector, the location of the drop reveals a cut with small conductance. We then show that for any cut in the graph, and for many starting vertices within that cut, an approximate personalized PageRank vector will have a sharp drop sufficient to produce a cut with conductance nearly as small as th...

Personalized Hitting Time for Informative Trust Mechanisms Despite Sybils

Informative and scalable trust mechanisms that are robust to manipulation by strategic agents are a critical component of multi-agent systems. While the global hitting time mechanism (GHT) introduced by Hopcroft and Sheldon [9] is more robust to manipulation than PageRank, strategic agents can still benefit significantly under GHT by performing sybil attacks. In this paper, we introduce the per...

Journal:
  • PVLDB

Volume 10

Publication date: 2016